Handwritten Text Image Compression for Indic Script Document

Authors

  • Smita V. Khangar
  • Latesh G. Malik
Abstract

This paper presents a compression scheme for handwritten text document images in Indian languages. Document image compression is an active area of research, and current OCR technology is not effective at handling handwritten text images. The proposed scheme targets handwritten gray-level documents in Devnagri script and is based on separating the foreground of an image from its background and on connected component labeling. Experiments were carried out on handwritten images in Devnagri (Hindi and Marathi). Compression schemes are available for printed text in Indian languages, but little work has been reported on compression standards for handwritten text images, which makes the compression of handwritten text images in Indian languages important. The modules achieve a good compression ratio.

General Terms: Image Processing, Data Compression, Experimentation, Verification.
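
The two building blocks named in the abstract, foreground/background separation and connected component labeling, can be illustrated with a minimal sketch. The abstract does not say how the foreground is separated, so this example assumes a global Otsu threshold, dark ink on lighter paper, and 8-connectivity. The function names (`otsu_threshold`, `separate_foreground`, `label_components`) are hypothetical and are not taken from the paper.

```python
from collections import deque


def otsu_threshold(gray):
    """Return the gray level (0-255) that maximizes between-class variance.

    `gray` is a list of rows of integer pixel values in 0..255.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def separate_foreground(gray):
    """Binary mask: 1 for ink (assumed darker than paper), 0 for background."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]


def label_components(mask):
    """Label 8-connected foreground regions with breadth-first search.

    Returns (labels, count); background pixels keep label 0.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Calling `label_components(separate_foreground(page))` on a gray-level page given as a list of rows yields one label per ink region. A compression scheme could then encode the labeled foreground regions and the background separately, but how the paper encodes either layer is not stated in the abstract.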


Similar resources

Compression of Scan Digitized Handwritten Text for Indian Language Document

Document image compression is used for the speedy transmission of data over the web. This paper deals with an effective compression scheme for handwritten gray-level documents in Devnagri script. Current OCR technology is not effective at handling handwritten textual images. The proposed compression scheme is based on the separation of the foreground and background of the image. Experimen...


Compression Method for Handwritten Document Images in Devnagri Script

Document image compression is used for speedy communication over the network. In the context of document image compression, most of the work has been done for printed textual images, while very little work has been reported on the compression of handwritten text images. The textual form of images is different from the conventional form of images. Document image analysis and compression used for preserving, storing a...


Word level Script Identification from Bangla and Devanagri Handwritten Texts mixed with Roman Script

India is a multilingual country where Roman script is often used alongside different Indic scripts in a text document. To develop a script-specific handwritten Optical Character Recognition (OCR) system, it is therefore necessary to identify the scripts of handwritten text correctly. In this paper, we present a system which automatically separates the scripts of handwritten words from a docum...


Convolution Based Technique for Indic Script Identification from Handwritten Document Images

Determining the script type of a document image is a complex real-life problem for a multi-script country like India, where 23 official languages (including English) are present and 13 different scripts (including Roman) are used to write them. The problem becomes more challenging when handwritten documents are considered. In this paper an app...


A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion

Handwritten document image dataset development is one of the most tedious and time-consuming tasks in optical character recogniser (OCR) related experimental work. Special attention needs to be given to feasibility, realness, clarity, etc. while collecting real-life data from different writers. Few efforts can be found in the literature for development of handwritten NIdb (numeral image ...




Publication date: 2012